Risk-Averse Allocation Indices for Multiarmed Bandit Problem
Authors
Abstract
In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. On the other hand, decision-makers are risk-averse in some real-life applications. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the problem with respect to this novel setting and propose a priority-index heuristic that gives risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the excellence of this heuristic and show that it can achieve optimal or near-optimal interpretable policies.
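The abstract names the construction only at a high level. Purely as an illustration, the Python sketch below scores each arm with a Gittins-style index in which the expected reward is replaced by a mean-semideviation risk adjustment; the function `risk_adjusted_index`, the `risk_coef` parameter, and the use of lower semideviation are assumptions made for this sketch, not the paper's dynamic risk measure.

```python
import numpy as np

def risk_adjusted_index(samples, discount=0.9, risk_coef=0.5):
    """Hypothetical risk-averse index for one arm.

    Replaces the expected reward in a Gittins-style index with a
    mean-semideviation adjustment: mean - risk_coef * E[(mean - X)^+].
    This is a stand-in for the paper's dynamic risk measure, not its definition.
    """
    mean = samples.mean()
    downside = np.maximum(mean - samples, 0.0).mean()  # lower semideviation
    return (mean - risk_coef * downside) / (1.0 - discount)

rng = np.random.default_rng(0)
# Three hypothetical arms with similar means but different reward spreads.
arms = [rng.normal(1.0, sigma, size=10_000) for sigma in (0.1, 0.5, 2.0)]
indices = [risk_adjusted_index(a) for a in arms]
best = int(np.argmax(indices))  # a risk-averse policy pulls the highest index
print(indices, "-> pull arm", best)
```

With `risk_coef=0` the ordering reduces to the risk-neutral expected-reward case, which is the sense in which the sketch mirrors the Gittins-style structure described above.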
Similar resources
The Irrevocable Multiarmed Bandit Problem
This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective and we refer to this problem as the ‘Irrevocable Multi-Armed Bandit’ problem. We observe that na...
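To make the irrevocability restriction concrete, here is a minimal simulation assuming Bernoulli arms examined in sequence and a naive keep-or-discard rule based on the empirical mean; the threshold rule is invented for illustration and is not the policy studied in the paper.

```python
import random

def irrevocable_run(true_means, pulls_per_arm=50, keep_threshold=0.5, seed=1):
    """Pull arms in sequence; once an arm is discarded it may never be revisited.

    The keep-or-discard rule here (empirical mean vs. a fixed threshold) is
    purely illustrative; the cited paper studies far more refined policies.
    """
    random.seed(seed)
    total = 0
    kept = None
    for mean in true_means:          # arms are considered one at a time
        rewards = [random.random() < mean for _ in range(pulls_per_arm)]
        total += sum(rewards)
        if sum(rewards) / pulls_per_arm >= keep_threshold:
            kept = mean              # commit to this arm ...
            break                    # ... all earlier arms are irrevocably gone
    return total, kept

print(irrevocable_run([0.3, 0.6, 0.8]))
```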
The Nonstochastic Multiarmed Bandit Problem
In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff...
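The excerpt ends before the authors' algorithm is described. For the adversarial setting this abstract refers to, the usual reference point is Exp3 (exponential weights with uniform exploration); the sketch below is a standard textbook version and is not claimed to match the exact variant analyzed in the paper.

```python
import math, random

def exp3(reward_fn, K, T, gamma=0.1, seed=0):
    """Exp3: exponential-weight algorithm for the adversarial K-armed bandit.

    gamma mixes in uniform exploration; rewards are assumed to lie in [0, 1].
    """
    random.seed(seed)
    weights = [1.0] * K
    total = 0.0
    for t in range(T):
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        reward = reward_fn(arm, t)          # observed only for the pulled arm
        total += reward
        est = reward / probs[arm]           # importance-weighted estimate
        weights[arm] *= math.exp(gamma * est / K)
        if max(weights) > 1e300:            # guard against float overflow
            weights = [w / 1e300 for w in weights]
    return total

# Example: arm 2 pays slightly more on average (stochastic stand-in
# for an adversarial reward sequence).
payout = lambda arm, t: float(random.random() < (0.4 if arm != 2 else 0.6))
print(exp3(payout, K=3, T=5000))
```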
A Lemma on the Multiarmed Bandit Problem
We prove a lemma on the optimal value function for the multiarmed bandit problem which provides a simple, direct proof of the optimality of write-off policies. This, in turn, leads to a new proof of the optimality of the index rule.
Asymptotically Efficient Allocation Rules for the Multiarmed Bandit Problem with Multiple Plays, Part II: Markovian Rewards
At each instant of time we are required to sample a fixed number m ≥ 1 out of N Markov chains whose stationary transition probability matrices belong to a family suitably parameterized by a real number θ. The objective is to maximize the long run expected value of the samples. The learning loss of a sampling scheme corresponding to a parameter configuration C = (θ1, ..., θN) is quantified...
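As a toy illustration of the multiple-plays setting, the sketch below pulls the m arms with the highest empirical means at each step; this naive rule is a stand-in only, not the asymptotically efficient allocation rules the paper constructs, and it uses i.i.d. rewards in place of Markovian ones.

```python
import numpy as np

def top_m_sample_mean(means, m=2, T=1000, seed=0):
    """Toy multiple-plays rule: each step, pull the m arms with the highest
    empirical means, after a round-robin initialization phase.

    A naive stand-in for the asymptotically efficient allocation rules
    constructed in the cited paper.
    """
    rng = np.random.default_rng(seed)
    N = len(means)
    counts = np.zeros(N)
    sums = np.zeros(N)
    for t in range(T):
        if t < N:                            # round-robin initialization
            chosen = [(t + i) % N for i in range(m)]
        else:
            chosen = np.argsort(sums / counts)[-m:]
        for a in chosen:
            r = rng.normal(means[a], 1.0)    # i.i.d. stand-in for Markovian rewards
            counts[a] += 1
            sums[a] += r
    return sums.sum()

print(top_m_sample_mean([0.1, 0.5, 0.9, 1.2], m=2))
```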
Finite-Time Regret Bounds for the Multiarmed Bandit Problem
We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log² T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a ...
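Both rule families named in that abstract are short to state. A minimal sketch of ε-greedy and Boltzmann (softmax) allocation on stationary Bernoulli arms follows, with hand-picked ε and temperature; the constants a, b, and c in the bound depend on such tuning.

```python
import math, random

def pull(q, counts, explore):
    """One allocation step: `explore` picks the arm, the empirical mean updates."""
    arm = explore(q)
    r = float(random.random() < TRUE_MEANS[arm])   # Bernoulli reward
    counts[arm] += 1
    q[arm] += (r - q[arm]) / counts[arm]           # incremental mean
    return r

def eps_greedy(q, eps=0.1):
    if random.random() < eps:
        return random.randrange(len(q))            # explore uniformly
    return max(range(len(q)), key=q.__getitem__)   # exploit best estimate

def boltzmann(q, temp=0.2):
    w = [math.exp(v / temp) for v in q]            # softmax over estimates
    return random.choices(range(len(q)), weights=w)[0]

TRUE_MEANS = [0.3, 0.5, 0.7]
random.seed(0)
for explore in (eps_greedy, boltzmann):
    q, counts = [0.0] * 3, [0] * 3
    total = sum(pull(q, counts, explore) for _ in range(5000))
    print(explore.__name__, total)
```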
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2021
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2021.3053539